AITopics | attention-based model

Collaborating Authors

attention-based model

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Attention-Based Models for Speech Recognition

Jan K. Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, Yoshua Bengio

Neural Information Processing SystemsOct-2-2025, 00:28:32 GMT

Neural Information Processing Systems http://nips.cc/

attention mechanism, mechanism, utterance, (15 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.04)
Europe > Poland > Lower Silesia Province > Wroclaw (0.04)
Europe > Germany > Bremen > Bremen (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.68)

Add feedback

Attention-Based Models for Speech Recognition

Neural Information Processing SystemsAug-12-2025, 21:33:37 GMT

Recurrent sequence generators conditioned on input data through an attention mechanism have recently shown very good performance on a range of tasks including machine translation, handwriting synthesis and image caption generation. We extend the attention-mechanism with features needed for speech recognition. We show that while an adaptation of the model used for machine translation reaches a competitive 18.6\% phoneme error rate (PER) on the TIMIT phoneme recognition task, it can only be applied to utterances which are roughly as long as the ones it was trained on. We offer a qualitative explanation of this failure and propose a novel and generic method of adding location-awareness to the attention mechanism to alleviate this issue. The new method yields a model that is robust to long inputs and achieves 18\% PER in single utterances and 20\% in 10-times longer (repeated) utterances. Finally, we propose a change to the attention mechanism that prevents it from concentrating too much on single frames, which further reduces PER to 17.6\% level.

attention mechanism, attention-based model, name change, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.67)
Information Technology > Artificial Intelligence > Machine Learning (0.40)

Add feedback

Can Hessian-Based Insights Support Fault Diagnosis in Attention-based Models?

Jahan, Sigma, Rahman, Mohammad Masudur

arXiv.org Artificial IntelligenceJun-10-2025

As attention-based deep learning models scale in size and complexity, diagnosing their faults becomes increasingly challenging. In this work, we conduct an empirical study to evaluate the potential of Hessian-based analysis for diagnosing faults in attention-based models. Specifically, we use Hessian-derived insights to identify fragile regions (via curvature analysis) and parameter interdependencies (via parameter interaction analysis) within attention mechanisms. Through experiments on three diverse models (HAN, 3D-CNN, DistilBERT), we show that Hessian-based metrics can localize instability and pinpoint fault sources more effectively than gradients alone. Our empirical findings suggest that these metrics could significantly improve fault diagnosis in complex neural architectures, potentially improving software debugging practices.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2506.07871

Country:

Europe > Norway > Central Norway > Trøndelag > Trondheim (0.05)
North America > Canada (0.05)
North America > United States > New York > New York County > New York City (0.04)
Europe > Italy > Marche > Ancona Province > Ancona (0.04)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Inferring genotype-phenotype maps using attention models

Rijal, Krishna, Holmes, Caroline M., Petti, Samantha, Reddy, Gautam, Desai, Michael M., Mehta, Pankaj

arXiv.org Artificial IntelligenceApr-15-2025

Predicting phenotype from genotype is a central challenge in genetics. Traditional approaches in quantitative genetics typically analyze this problem using methods based on linear regression. These methods generally assume that the genetic architecture of complex traits can be parameterized in terms of an additive model, where the effects of loci are independent, plus (in some cases) pairwise epistatic interactions between loci. However, these models struggle to analyze more complex patterns of epistasis or subtle gene-environment interactions. Recent advances in machine learning, particularly attention-based models, offer a promising alternative. Initially developed for natural language processing, attention-based models excel at capturing context-dependent interactions and have shown exceptional performance in predicting protein structure and function. Here, we apply attention-based models to quantitative genetics. We analyze the performance of this attention-based approach in predicting phenotype from genotype using simulated data across a range of models with increasing epistatic complexity, and using experimental data from a recent quantitative trait locus mapping study in budding yeast. We find that our model demonstrates superior out-of-sample predictions in epistatic regimes compared to standard methods. We also explore a more general multi-environment attention-based model to jointly analyze genotype-phenotype maps across multiple environments and show that such architectures can be used for "transfer learning" - predicting phenotypes in novel environments with limited training data.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2504.10388

Country:

North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Massachusetts > Middlesex County > Medford (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (0.94)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Adaptive Attention-Based Model for 5G Radio-based Outdoor Localization

Yaman, Ilayda, Tian, Guoda, Tufvesson, Fredrik, Edfors, Ove, Zhang, Zhengya, Liu, Liang

arXiv.org Artificial IntelligenceMar-31-2025

Radio-based localization in dynamic environments, such as urban and vehicular settings, requires systems that can efficiently adapt to varying signal conditions and environmental changes. Factors such as multipath interference and obstructions introduce different levels of complexity that affect the accuracy of the localization. Although generalized models offer broad applicability, they often struggle to capture the nuances of specific environments, leading to suboptimal performance in real-world deployments. In contrast, specialized models can be tailored to particular conditions, enabling more precise localization by effectively handling domain-specific variations and noise patterns. However, deploying multiple specialized models requires an efficient mechanism to select the most appropriate one for a given scenario. In this work, we develop an adaptive localization framework that combines shallow attention-based models with a router/switching mechanism based on a single-layer perceptron (SLP). This enables seamless transitions between specialized localization models optimized for different conditions, balancing accuracy, computational efficiency, and robustness to environmental variations. We design three low-complex localization models tailored for distinct scenarios, optimized for reduced computational complexity, test time, and model size. The router dynamically selects the most suitable model based on real-time input characteristics. The proposed framework is validated using real-world vehicle localization data collected from a massive MIMO base station (BS), demonstrating its ability to seamlessly adapt to diverse deployment conditions while maintaining high localization accuracy.

artificial intelligence, machine learning, scenario, (17 more...)

arXiv.org Artificial Intelligence

2503.2381

Country:

Europe > Sweden (0.14)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)

Genre: Research Report (0.40)

Industry: Telecommunications (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.69)

Add feedback

Robust Neural Processes for Noisy Data

Shapira, Chen, Rosenbaum, Dan

arXiv.org Artificial IntelligenceNov-3-2024

Models that adapt their predictions based on some given contexts, also known as in-context learning, have become ubiquitous in recent years. We propose to study the behavior of such models when data is contaminated by noise. Towards this goal we use the Neural Processes (NP) framework, as a simple and rigorous way to learn a distribution over functions, where predictions are based on a set of context points. Using this framework, we find that the models that perform best on clean data, are different than the models that perform best on noisy data. Specifically, models that process the context using attention, are more severely affected by noise, leading to in-context overfitting. We propose a simple method to train NP models that makes them more robust to noisy data. Experiments on 1D functions and 2D image datasets demonstrate that our method leads to models that outperform all other NP models for all noise levels.

artificial intelligence, data quality, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2411.0167

Country:

Asia > Middle East > Israel > Haifa District > Haifa (0.04)
North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science > Data Quality > Data Cleaning (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Revealing and Mitigating the Local Pattern Shortcuts of Mamba

You, Wangjie, Tang, Zecheng, Li, Juntao, Yao, Lili, Zhang, Min

arXiv.org Artificial IntelligenceOct-21-2024

Large language models (LLMs) have advanced significantly due to the attention mechanism, but their quadratic complexity and linear memory demands limit their performance on long-context tasks. Recently, researchers introduced Mamba, an advanced model built upon State Space Models(SSMs) that offers linear complexity and constant memory. Although Mamba is reported to match or surpass the performance of attention-based models, our analysis reveals a performance gap: Mamba excels in tasks that involve localized key information but faces challenges with tasks that require handling distributed key information. Our controlled experiments suggest that this inconsistency arises from Mamba's reliance on local pattern shortcuts, which enable the model to remember local key information within its limited memory but hinder its ability to retain more dispersed information. Therefore, we introduce a global selection module into the Mamba model to address this issue. Experiments on both existing and proposed synthetic tasks, as well as real-world tasks, demonstrate the effectiveness of our method. Notably, with the introduction of only 4M extra parameters, our approach enables the Mamba model(130M) to achieve a significant improvement on tasks with distributed information, increasing its performance from 0 to 80.54 points.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.15678

Genre:

Research Report > Experimental Study (0.88)
Research Report > Strength High (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.67)

Add feedback

Reviews: Latent Alignment and Variational Attention

Neural Information Processing SystemsOct-8-2024, 01:41:46 GMT

Update based on author rebuttal: I believe the authors have addressed the main criticisms of this paper (not clear how it's different from prior work) and have also provided additional experiments. I've raised my score accordingly. This paper focuses on using variational inference to train models with a "latent" (stochastic) attention mechanism. They consider attention as a categorical or dirichlet random variable, and explore using posterior inference on alignment decisions. They test various approaches on NMT and visual question answering datasets.

attention-based model, domain structure, latent alignment and variational attention, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.53)

Add feedback